Skip to content

IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards#438

Merged
arthurpaulino merged 13 commits into
mainfrom
ap/optimize
Jun 12, 2026
Merged

IxVM kernel: cut Aiur FFT cost on reduction- and check-heavy shards#438
arthurpaulino merged 13 commits into
mainfrom
ap/optimize

Conversation

@arthurpaulino

@arthurpaulino arthurpaulino commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Thirteen commits in the Aiur IxVM kernel: ten optimization areas — each
targeting a specific hotspot found by profiling per-function FFT cost on real
shards or single constants of a compiled Init environment — plus a CLI
ergonomics fix for resolving private Lean.Name arguments. No semantics change:
every shard still checks without error, and the full lake test -- --ignored ixvm suite passes (297 tests) at every step.

Cumulative effect on a reduction-heavy shard (Init shard 9, one owned constant,
deep reduction): 45.5G → 16.6G total FFT (−64%). The structural / keyed-lookup
areas (1–5) account for 45.5G → 37.8G; the Nat-reduction area (6) takes it the
rest of the way by collapsing two symbolic-arithmetic blowups from O(n) / O(n²)
to O(1). That same area makes a previously intractable proof class — the UTF-8
codec lemma ByteArray.utf8DecodeChar?_utf8EncodeChar_append, whose checking
spawns deep Nat.add x 0xC0 succ-towers and c >>> 6/12/18 division-towers —
reduce in constant rather than tower-depth cost. The deserialization (7),
memoization (8), substitution (9), and primitive-gauntlet (10) areas then cut
ingress and reduction cost across all consts.

Those last areas make another previously intractable proof checkable:
Vector.extract_append._proof_1 (an omega-backed Vector.extract/++ lemma
whose check is ~50% beta-substitution) went from OOM (>14G RAM) to 56.6G FFT
(area 8, which lets it complete at all) to 38.3G (area 9, −32%) to 28.0G
(area 10, a further −27%) — −50% from the first completing run.

Method

Built a .ixe of the Init closure, profiled and sharded it
(ix profileix shard --shards 64), then ran ix check per shard or per
constant and read the per-function FFT table (--stats-out). Shard 10
(check-heavy: lookups, address compares, deserialize) and shard 9
(reduction-heavy: whnf, substitution, spine collection) exercise different
profiles; small hand-built or hand-picked reproducers (x + D = …, x >>> D = …,
Int.emod_emod_of_dvd) isolate and scale individual blowups.

Aiur cost model behind the changes: FFT ≈ per-function width × row height,
charged at the function's max-arm width on every row. Levers: (a) factor a
wide/rare arm into its own circuit; (b) cut row count via better memoization;
(c) replace linear scans with keyed lookups; (d) keep symbolic primitives
compact so they never materialize a tower.

Areas

1. Cheaper Expr.Share / universe lookups — commit 3774860.
The sharing-table and universe-array lookups walked the list with
list_lookup_u64 (per-step U64 predecessor). Switched to a field index and, for
Expr.Share, to list_drop — which returns the sublist pointer, so repeated
lookups dedup in the memo cache. Shard 10: total FFT 3.89G → 2.11G; the merged
pointer-list walk fell from 1.93G to 5.8M.

2. Hot/cold split of address_eq — commit 69b6936.
The primitive-dispatch gauntlet compares each Const head against ~50 hardcoded
addresses, making address_eq (a full 32-limb compare, width 109) the hottest
circuit. Almost every comparison differs at limb 0. Split it: address_eq
compares limb 0 only and delegates the rest to address_eq_tail, whose height is
just the rare limb-0 matches. (A nested match in the same function does nothing —
Aiur charges full width per row regardless of arm; the split must be a function
boundary.) Shard 10: width 109 → 80, 509.8M → 374.1M; total 2.106G → 1.973G.

3. Hot/cold split of whnf_with_spine — commit 0e30d78.
On reduction-heavy consts this is the single hottest circuit (width 89 × 3.36M
rows), its width set by two arms a minority of rows reach: Const-head and
Proj-head dispatch. Factored both into whnf_const_head / whnf_proj_head. The
Proj arm is the jackpot — 3.4K rows but nearly half the width. Shard 9: width
89 → 34, 6.48G → 2.48G.

4. Non-tail, single-definition collect_spine — commit 5b7fabc.
collect_spine peeled the App chain with a tail-recursive accumulator. Tail
recursion buys nothing in Aiur (stack depth is free), and the accumulator landed
in the memo key, blocking reuse — 0 cache hits across 5.38M rows. Rewrote it
non-tail, keyed on the expression alone (2.19M hits now), and merged the two
verbatim copies into one definition in kernelTypes. Shard 9: 2.65G → 0.96G.

5. Keyed skip-set for sharded check — commit a52967f.
The largest address_eq source on a sharded check was addr_in_list, an O(N)
linear scan of the assumption-leaf list per checked const (1.81M calls on shard
9). Replaced with an RBTreeMap skip-set keyed on the first 4 address bytes:
build once, membership = one rbtree lookup + one confirming address_eq. Shard 9:
address_eq 2.07M rows / 3.48G → 274K rows / 0.40G; total 41.7G → 37.8G.

6. Compact symbolic Nat reduction — commits dbc4177, 5dcab7f, f7cfe23.
Nat primitives applied to (symbolic base, literal) delta-unfolded into towers
the kernel then re-walked: Nat.add x 0xC0succ^192(x), and Nat.div x 2
(what x >>> k unfolds to, k times) → an exploding division algorithm. Three
coordinated commits keep them compact:

  • dbc4177: try_nat_linear_rec reduces Nat.rec base (succ-step) (Lit n) with
    a symbolic base directly to Nat.add base (Lit n) instead of n iota steps.
  • 5dcab7f: whnf (try_nat_offset_stuck) leaves Nat.add base (Lit n) stuck as
    a compact offset; def-eq (nat_offset_of + try_def_eq_nat) decomposes each
    side to base + KLimbs and decides base ≟ base on equal offsets.
  • f7cfe23: generalizes the stuck rule to Nat.div / Nat.mod base (Lit k)
    (k ≥ 2), so symbolic >>> reduces to nested-stuck-div in O(1).

Benchmarks: x + D = Nat.rec x succ-step D goes O(D) → depth-independent
(328M → 28.5M at D=512, −91%); x >>> 18 = (x >>> 9) >>> 9 goes 15.65G → 455M
(−97%), matching x >>> 6. All magnitudes stay KLimbs (no Goldilocks
collapse); only de Bruijn / top-position indices are G.

7. Narrow Ixon expr deserialization — commit fa3447e.
On large-closure consts, ingress deserialization dominates: the two hottest
parsers are get_expr and get_app_telescope, each carrying width set by a
single arm. Split get_expr's Let arm (three sub-get_expr calls + triple
store) into get_expr_let, dropping get_expr width 102 → 75. Changed
get_app_telescope's func: Expr parameter to func: &Expr (load inside),
dropping width 95 → 78. Ingress profile (2000-const UTF-8 closure, run_check
load-only): get_expr 766M → 563M, get_app_telescope 477M → 392M, total
ingress 3.13G → 2.85G.

8. lbr-trimmed context keys for whnf / infer / def-eq memoization — commit
d96c738.
Aiur auto-memoizes whnf / k_infer / k_is_def_eq on the full
(expr, types, …) argument tuple, so the same closed reduction gets a fresh
memo key at every binder depth and misses. Mirror Rust's
whnf_key / infer_key / def_eq_ctx_key (src/ix/kernel/tc.rs): trim types
to the suffix reachable from the expression's loose-bvar range (ctx_trim /
ctx_reachable, a fixpoint that closes over kept frames' own ranges) before the
memoized call. A closed subterm (lbr == 0) keys to the empty context and shares
its result across every depth. Eager substitution is unchanged (matches the
Lean / Rust reference); the win is purely cache keying. Fast paths
(closed → Nil, full-reach → as-is) keep the boundary cheap.
FFT (ix check <const> --stats-out): Nat.add_comm 58.28M → 57.90M (−0.66%);
Int.emod_emod_of_dvd 5.090G → 4.691G (−7.85%, inference / omega-heavy); Init
shard 10 4.738G → 4.721G (−0.37%). Net-positive everywhere measured.

9. Multi-arg beta with a narrow substitution walk — commit 3203691.
whnf_apply_beta substituted one spine argument per expr_inst1, re-walking
the (shrinking) body for each, so a K-argument application (λλλ. body) x y z
walked the body K times. Substitution (expr_inst1 + expr_inst1_walk) was
~53% of FFT on omega-heavy proofs. Mirror Rust's multi-arg beta
(src/ix/kernel/whnf.rs:541-567): peel_beta consumes the whole telescope and
expr_inst_many (= simul_subst) substitutes all consumed args in one walk.
Single-arg betas stay on the cheap expr_inst1 path, so short telescopes don't
pay the list overhead — the reason an earlier unconditional simul_subst
regressed and was reverted. A hot/cold split factors the multi-arg walk's BVar
arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar,
keeping the hot App/Lam walk narrow (width 82 → ~53, the same pattern as areas
2–3). FFT (ix check <const> --stats-out):
…Vector.extract_append._proof_1 56.64G → 38.31G (−32.4%); Nat.add_comm
57.90M → 56.17M; Init shard 10 4.721G → 4.668G (−1.1%); shard 9 16.346G →
16.314G.

10. Gate the primitive-op gauntlet behind a memoized classifier — commit
87cb8b0.
whnf_const_head ran five try_* dispatchers (nat / str / bitvec / native /
decidable) in sequence on every Const head, each comparing the head
address against its family's primitive addresses — wasted on the common
non-primitive head. Worse, several try_* speculatively WHNF their arguments
(to test for literal / decidable operands), so a non-primitive head still
triggered deep argument reductions that were then thrown away. Factor the chain
into try_address_primitives and gate it behind prim_any_addr: a sum of the 43
head-dispatched address_eq checks, memoized per head address (computed once
per const, not per whnf). A 0 means no try_* can match, so dispatch jumps
straight to projection-definition / delta. Validated by a differential assert
(prim_any_addr must be 1 whenever the gauntlet actually matched) run over the
suite + Int.emod_emod_of_dvd + Vector._proof_1; it caught two missing
addresses (string_utf8_byte_size, punit_size_of_1), then passed across
millions of reductions — assert removed afterward. FFT (ix check <const>):
…Vector.extract_append._proof_1 38.31G → 28.02G (−26.8%, the skipped
speculative arg-WHNF); Int.emod_emod_of_dvd 4.69G → 3.97G (−15%);
Nat.add_comm 56.17M → 56.06M; Init shard 9 16.314G → 16.279G; shard 10
4.668G → 4.659G.

Tooling

Resolve private Lean.Name args in check/prove — commit 6ec124a.
parseName (naive dot-split; an all-digit segment → Name.mkNum) can't rebuild
a private name like _private.Init.Data.Vector.Extract.0.…_proof_1 from its
display string, so ix check / ix prove reported it "not found". Added
toString fallbacks — resolveName (compiled-in env) and resolveIxeAddr
(.ixe env) — that match the arg against each constant's displayed name when the
direct lookup misses. Covers both commands (shared forEachClaim), across the
compiled-in and --ixe paths. Tooling only; no kernel change.

Soundness

  • Areas 1, 3, 4, 7 are structural / refactor rewrites with identical results
    (each shard still checks).
  • Area 2: address_eq returns the same value as a full 32-limb compare — one
    differing limb proves inequality.
  • Area 5: key collisions overwrite, sound because the only effect is a missed
    skip
    (a frontier const gets rechecked, never wrongly trusted); the confirming
    address_eq rules out false positives; the 4-byte key makes collisions
    negligible regardless.
  • Area 6: keeping Nat.add / div / mod symbolic literal stuck is the correct
    normal form (these are irreducible for a symbolic base). try_def_eq_nat
    returns eq = 1 only when offsets are equal and bases are def-eq (+ k is
    injective); otherwise it falls back to the generic path. BAD negative cases
    still get rejected (no false accept), and recursor-on-symbolic-offset reduces
    correctly.
  • Area 8: trimming types to the loose-bvar-reachable suffix changes only the
    memo key, not the computation — a closed expr never consults the trimmed
    frames, and ctx_reachable's fixpoint keeps every frame a kept frame's stored
    type can reach. Eager substitution and all reduction logic are untouched.
  • Area 9: expr_inst_many is the standard simultaneous de Bruijn substitution
    (simul_subst); peel_beta consumes exactly one argument per Lam, so
    (λλλ. body) x y z and three sequential single substitutions produce the same
    term. The hot/cold split only relocates the BVar arm — identical result.
    Eager substitution semantics are unchanged (matches the Lean / Rust reference).
  • Area 10: gating is sound iff prim_any_addr is 1 whenever the gauntlet would
    reduce — skipping it then only elides work that returns "no match". A
    differential assert checked exactly this invariant on the suite plus the two
    heavy consts (millions of reductions across every primitive family) before the
    gate was trusted; the assert caught the only two omissions and then ran clean.
  • Whole suite: the full ixvm suite passes (297 tests) at every commit.

Expr.Share resolution and universe lookups walked the sharing/univ lists
with `list_lookup_u64` (per-step `u64_is_zero` + `relaxed_u64_pred`). Switch
to:

- universe lookups: `list_lookup(univs, flatten_u64(idx))` — cheap per-step
  field subtraction instead of the U64 predecessor.
- Expr.Share (Convert convert_expr + Ingress deref_share):
  `let ListNode.Cons(e, _) = load(list_drop(sharing, flatten_u64(idx)));`.
  list_drop returns the sublist *pointer*; repeated Share lookups collapse
  to shared memo entries, so the walk is near-free.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10`
(Init, owned=3, closure=676): total FFT 3.89G -> 2.11G. The merged
pointer-list walk drops from 1.93G (list_lookup) to 5.8M (list_drop, 0.27%).

(Measurement run had in-circuit blake3 verification disabled — not committed.)
The primitive-dispatch gauntlet in whnf compares each Const head against ~50
known primitive addresses, so address_eq is the single hottest circuit. The
full 32-limb compare ran at width 109 on every call.

Split it: `address_eq` now compares limb 0 only and rejects (the common case —
a single differing limb proves inequality), delegating the remaining 31-limb
compare to a separate `address_eq_tail`. Because Aiur charges a function's full
width on every one of its rows, the cold path must be a separate function (a
nested match in address_eq changes nothing — measured identical); as its own
circuit, address_eq_tail's height is only the rare limb-0-match rows.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 10`:
- address_eq: width 109 -> 80, FFT 509.8M -> 374.1M
- address_eq_tail (new): width 108, height 1683, FFT 1.9M
- total shard FFT: 2.106G -> 1.973G

Limb-0 is the sweet spot: hot height (259991) dominates tail height (1683), so
pulling a second limb forward (N=2) widens the hot circuit by more than it
saves in the tail (measured 1.976G, worse). Result is identical to a full
32-limb compare (sound).

(Measurement run had in-circuit blake3 verification disabled — not committed.)
whnf_with_spine is the single hottest circuit on reduction-heavy consts: width
89 charged on every one of its 3.36M rows. The width was set by two arms that
only a minority of rows reach — the Const-head dispatch (delta/iota/quot/prim
gauntlet) and the Proj-head reduction.

Factor both into their own functions (`whnf_const_head`, `whnf_proj_head`).
Because Aiur charges a function's full width on every row, the ~76% of steps
that are App/Lam/Let now run at the narrow residual width instead of 89; the
wide dispatch only taxes its own (smaller) row count.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`
(1 reduction-heavy owned const):
- whnf_with_spine: width 89 -> 34, FFT 6.48G -> 2.48G
- whnf_const_head (new): width 77, height 821490, FFT 1.24G
- whnf_proj_head (new): width 56, height 3452, FFT 0.002G
- total shard FFT: 45.54G -> 42.78G (-6.1%)

The Proj arm is rare (3452 rows) but was nearly half the width; extracting it
dropped whnf_with_spine 65 -> 34 for ~zero relocated cost. Pure refactor.
Replace the tail-recursive `collect_spine_go(e, acc)` (and its verbatim copy
`collect_spine_simple_go`) with one non-tail `collect_spine(e)` in the
kernelTypes block. Repoint all callers (whnf, primitive, inductive_check,
def_eq); delete both old copies. Keyed on `e` alone, it now dedups shared
sub-spines (0 -> 2.19M cache hits).

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`:
- collect_spine: 5.38M rows / 2.65G -> 2.17M rows / 0.96G
- list_snoc.Ptr: +0.96G (new)
- total shard FFT: 42.78G -> 41.67G
check_all_skipping tested each closure const against the assumption-leaf list
via addr_in_list, an O(N) linear address_eq scan — the single largest address_eq
source on sharded checks (1.81M calls on Init shard 9, more than all primitive
dispatch combined).

Build an RBTreeMap skip-set once (keyed on the first 4 address bytes), then test
membership with one rbtree lookup + one confirming address_eq. Key collisions
overwrite: sound because a missed skip only rechecks a frontier const, and the
confirming address_eq rules out false positives.

Measured on `lake exe ix check --ixe init.ixe --ixes init.ixes --shard 9`:
- address_eq: 2.07M rows / 3.48G -> 274K rows / 0.40G
- rbtree build + lookups: +~0.46G
- total shard FFT: 41.67G -> 37.84G
try_nat_linear_rec only folded `Nat.rec base (λ_ ih => succ ih) (Lit n)` when
the base was also a literal; a symbolic base declined and fell through to n
iota steps that materialize succ^n(base). Reduce the symbolic-base case
directly to the offset primitive `Nat.add base (Lit n)` instead.

This keeps the value in the same compact `base + n` form a literal already has,
so def-eq comparing it converges instead of descending n unary succ layers (the
UTF-8 `x + 0xC0` reduction class). Offset magnitude stays KLimbs; only the
nat_add top-position index is a (small) G.

Measured on a `x + D = Nat.rec x succ-step D` benchmark: the unary def-eq
descent collapses (nat_succ_of D -> ~constant), ~-15% total FFT at D=512. The
remaining cost is the in-whnf chain reduction (substitution), unchanged here.
Reducing `Nat.add x (Lit n)` (symbolic x) delta-unfolded into a succ^n(x)
tower, and the nat_succ dispatch then whnf-cascaded the whole chain — O(n)
substitution per reduction, the dominant cost on Nat-arithmetic proofs (the
UTF-8 codec class).

Two coordinated changes:
- whnf (try_nat_offset_stuck): leave `Nat.add base (Lit n)` (non-literal base,
  nonzero n) stuck as a compact offset instead of unfolding it.
- def-eq (nat_offset_of + try_def_eq_nat): decompose each side to base + KLimbs
  offset (reading `Nat.add base (Lit m)` in O(1), peeling the few succs whnf
  exposes) and decide `base_a ≟ base_b` only when offsets are equal; otherwise
  fall back. All magnitudes stay KLimbs (no Goldilocks collapse).

Measured on `x + D = Nat.rec x succ-step D` (depths 64..512): total FFT goes
from O(D) (328M at D=512) to a depth-independent ~28.5M (-91% at D=512). Full
`lake test -- --ignored ixvm` passes (297 tests; BAD cases still rejected);
a recursor-on-symbolic-offset completeness test reduces correctly.
`Nat.div x (Lit k)` / `Nat.mod x (Lit k)` with a symbolic base delta-unfolded
the division algorithm and expanded hugely, even though the result is
irreducible for a symbolic base. `Nat.shiftRight x k` unfolds to k nested
`Nat.div _ 2`, so a symbolic `>>>` materialized a deep, super-linear tower —
the dominant cost in the UTF-8 codec's `c >>> 6/12/18` encoding.

Generalize `try_nat_offset_stuck` (which already keeps `Nat.add base (Lit n)`
compact) to also leave `Nat.div`/`Nat.mod base (Lit k)` (k ≥ 2) stuck. Thresholds
keep `x/1 = x` and `x/0 = 0` reducing. Both `x >>> 18` and `(x >>> 9) >>> 9` then
reduce to the same nested-stuck-div form, so def-eq stays structural and the
identity still holds.

Measured on `x >>> 18 = (x >>> 9) >>> 9`: total FFT 15.65G -> 455M (-97%), now
depth-independent (matches `x >>> 6`). No regression on the Nat.add reproducers;
full `lake test -- --ignored ixvm` passes (297 tests).
Ingress deserialization (parsing every closure const's Ixon bytes into Expr
trees) is the dominant cost when checking large-closure constants. Two width
cuts on the hottest parsers:

- get_expr: the Let arm (three recursive get_expr calls) set the function width
  though Let is rare. Factored into get_expr_let — width 102 -> 75.
- get_app_telescope: passed `func` by value (the wide Expr union) on every
  recursion step. Pass it by `&Expr`, loaded only at the base case — width
  95 -> 78.

Measured by profiling ingress of the `ByteArray.utf8DecodeChar?…` closure
(2000 consts): get_expr 766M -> 563M, get_app_telescope 477M -> 392M, total
ingress FFT 3.13G -> 2.85G. Full `lake test -- --ignored ixvm` passes (297).
Key whnf / k_infer / k_is_def_eq memoization on the loose-bvar-reachable
suffix of the local context (ctx_trim / ctx_reachable), mirroring Rust's
whnf_key / infer_key / def_eq_ctx_key. A closed subterm (lbr 0) keys to
the empty context and shares its result across every binder depth; eager
substitution is unchanged. Fast paths (closed -> Nil, full-reach -> as-is)
keep the boundary cheap.

FFT (lake exe ix check <const> --stats-out): Nat.add_comm 58.28M -> 57.90M
(-0.66%); Int.emod_emod_of_dvd 5.090G -> 4.691G (-7.85%); Init shard 10
4.738G -> 4.721G (-0.37%). 297 ixvm tests pass.
parseName can't rebuild a private name (_private.M.0.foo) from its
display string -- the marker / scope-index components don't round-trip
through naive dot-splitting. Add toString fallbacks: resolveName for the
compiled-in env and resolveIxeAddr for a .ixe env, so check and prove
(which share forEachClaim) resolve names like
_private.Init.Data.Vector.Extract.0.Vector.extract_append._proof_1
across both the compiled-in and --ixe paths.
whnf_apply_beta substituted one spine arg per expr_inst1, re-walking the
(shrinking) body for each, so a K-arg application `(λλλ.body) x y z` walked
the body K times. Mirror Rust whnf.rs:541-567: peel the whole telescope
(peel_beta) and substitute all consumed args in ONE expr_inst_many walk
(simul_subst semantics). Single-arg betas stay on the cheap expr_inst1 path,
so short telescopes don't pay the list overhead (why the earlier
unconditional simul_subst regressed). Hot/cold split expr_inst_many_walk's
BVar arm (list_length / list_lookup / expr_lift) into expr_inst_many_bvar so
the hot App/Lam walk stays narrow.

Substitution was 53% of FFT on omega-heavy proofs. FFT (ix check <const>):
_private.…Vector.extract_append._proof_1 56.64G -> 38.31G (-32.4%);
Nat.add_comm 57.90M -> 56.17M; Init shard 10 4.721G -> 4.668G (-1.1%);
shard 9 16.346G -> 16.314G. 297 ixvm tests pass.
whnf_const_head ran 5 try_* dispatchers (nat / str / bitvec / native /
decidable) in sequence on every Const head, each comparing the head address
against its family's primitive addresses — wasted on the common non-primitive
head, and worse, several try_* speculatively WHNF their args (to test for
literals / decidables), cascading into deep reductions that are then thrown
away. Factor the chain into try_address_primitives and gate it behind
prim_any_addr: a memoized (per head address) sum of the 43 head-dispatched
primitive address_eq checks — 1 only for a real primitive, so a 0 skips the
whole gauntlet straight to projection-definition / delta.

Validated by a differential assert (prim_any_addr must be 1 whenever the
gauntlet matched) run over the 297-test suite + Int.emod_emod_of_dvd +
Vector.extract_append._proof_1; it caught two missing addresses
(string_utf8_byte_size, punit_size_of_1), then passed clean across millions of
reductions. Assert removed after validation.

FFT (ix check <const>): _private.…Vector.extract_append._proof_1
38.31G -> 28.02G (-26.8%); Int.emod_emod_of_dvd 4.69G -> 3.97G (-15%);
Nat.add_comm 56.17M -> 56.06M; Init shard 9 16.314G -> 16.279G; shard 10
4.668G -> 4.659G. 297 ixvm tests pass.
@johnchandlerburnham

Copy link
Copy Markdown
Member

Are any of these improvements able to be ported to the Rust kernel implementation?

Also, can we run execution out-of-circuit on Init, Lean & Mathlib?

@arthurpaulino arthurpaulino enabled auto-merge (squash) June 12, 2026 01:09
@arthurpaulino arthurpaulino merged commit 096fe8e into main Jun 12, 2026
14 checks passed
@arthurpaulino arthurpaulino deleted the ap/optimize branch June 12, 2026 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants